Inductive Information Retrieval Using Parallel Distributed Computation

Mozer


Massively parallel models of computation have shown promising results in the areas of
pattern recognition, memory, learning, and language comprehension (Anderson & Hinton,
1981; Feldman & Ballard, 1982; McClelland & Rumelhart, 1981; McClelland, Rumelhart, &
Hinton, in preparation). These models, called connectionist or parallel distributed processing
(PDP) models, offer a new approach to the representation and manipulation of knowledge.
In this approach, computation takes place through the simultaneous interaction of many
small pieces of knowledge, some pieces supporting each other and other pieces competing
with each other. PDP models have demonstrated the kind of flexibility and computational
power that characterize cognition, particularly those aspects of cognition that people
perform so effortlessly and naturally. The PDP approach promises to be particularly useful in
areas that demand flexible inferencing and reasoning with incomplete or imprecise
information (which I call inductive reasoning). This paper reports on an application of the PDP
approach to one such area, information retrieval.

I  will  focus  on  information  retrieval  systems  used  in  bibliographic  search,  known  as 
document  retrieval  systems.  Although  document  retrieval  will  serve  as  the  primary  example 
of  this  paper,  the  discussion  generalizes  quite  readily  to  information  retrieval  of  all  sorts. 
Indeed,  the  PDP  conceptualization  has  been  applied  to  information  retrieval  tasks  as  diverse 
as locating files and specifying commands on a computer system (Greenspan & Smolensky,
1984) and organizing information in an on-line manual (O’Malley, Smolensky, Bannon,
Conway, Graham, Sokolov, & Monty, 1983).

I  begin  with  a  discussion  of  document  retrieval  and  argue  that  inductive  reasoning  is 
useful  in  document  retrieval,  then  present  a  PDP  retrieval  model  and  examine  its  properties. 

Document Retrieval

Document  retrieval  systems  are  typically  used  to  search  for  books  or  articles  (hereafter, 
documents)  on  a  particular  topic.  To  aid  in  this  process,  each  document  is  labeled  by  a  set  of 
descriptor terms that characterize the content matter of the document. For example, a book
on Pascal compiler design might be identified by the descriptors PASCAL, COMPILERS,
COMPUTING, PROGRAMMING, and LANGUAGES. Queries to such a system often take the form of
a Boolean expression of descriptors, and the system reports all documents satisfying the
expression. For example, one might request all documents about "PROGRAMMING AND
(COMPILERS OR INTERPRETERS)."
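
The Boolean request above can be illustrated with a minimal Python sketch. The four-document index and its identifiers (doc_a through doc_d) are hypothetical and invented here for illustration; only the query expression comes from the text.

```python
# Hypothetical toy index: document id -> set of descriptor terms.
index = {
    "doc_a": {"PASCAL", "COMPILERS", "COMPUTING", "PROGRAMMING", "LANGUAGES"},
    "doc_b": {"PROGRAMMING", "INTERPRETERS", "LISP"},
    "doc_c": {"COMPILERS", "HARDWARE"},
    "doc_d": {"PROGRAMMING", "STYLE"},
}


def matches(descriptors):
    """Evaluate PROGRAMMING AND (COMPILERS OR INTERPRETERS) for one document."""
    return "PROGRAMMING" in descriptors and (
        "COMPILERS" in descriptors or "INTERPRETERS" in descriptors
    )


# A deductive system reports exactly the documents that satisfy the expression.
retrieved = sorted(doc for doc, descs in index.items() if matches(descs))
print(retrieved)
```

Every document either satisfies the expression or is rejected outright; doc_c and doc_d each match part of the request but are not reported.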

There are two difficult problems with such a system. First, users have a hard time
accurately specifying the information they are seeking, possibly because the document descriptors
have different semantics than they realize, or because they fail to include relevant descriptors
in the query, or because they include irrelevant descriptors. The end result is that the set of
descriptors chosen for the query is semantically inaccurate or ill-specified. The second
problem with document retrieval systems is that the indexing of documents is itself often
inconsistent and incomplete. Relevant descriptors are sometimes omitted from a document,
and because documents are added to the collection over long periods of time by many
individuals, descriptors are not applied consistently. When either query or document descriptors are faulty,
traditional retrieval systems generally perform poorly because the retrieval process is too
literal-minded; it simply matches query descriptors against document descriptors.

Rather  than  assuming  that  the  query  is  a  precise  specification  of  the  user’s  intentions 
and  that  the  document  descriptors  precisely  characterize  the  documents,  a  retrieval  system 
might  better  assume  that  there  is  some  uncertainty  in  the  query  and  document  descriptors 
and  that  the  descriptors  can  be  interpreted  in  a  loose  sense.  Under  these  assumptions,  the 
retrieval  process  must  be  considered  inductive,  not  deductive. 

What  sort  of  output  will  an  inductive  retrieval  system  produce?  The  deductive  Boolean 
retrieval  system  described  above  produced  a  set  of  documents  that  matched  the  query  exactly; 
all  other  documents  were  rejected.  However,  an  inductive  system  will,  by  its  very  nature, 
tend  to  produce  a  set  of  documents  that  match  the  query  to  varying  degrees.  Thus,  an 
inductive system will require a procedure for assigning a quality of match, or relevance
measure, to each document given a particular query.

There are a variety of methods in the information retrieval literature for assigning a
relevance measure to a document (Bärtschi & Frei, 1982; Bookstein, 1980; Buell & Kraft, 1981;
Salton, Fox, & Wu, 1983). Some use fuzzy logics to evaluate Boolean queries; others, known
as vector models, treat documents and queries as vectors in a concept space of large
dimensionality and then compute the distance between vectors. However, all of these methods
base their judgments of relevance solely on the relationship between document descriptors
and query descriptors.
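
As a rough illustration of the vector-model idea, the sketch below scores documents by the cosine between binary descriptor vectors. The choice of cosine is an assumption made here (the cited papers use a variety of measures), and the index and document names are hypothetical.

```python
import math


def cosine(query, doc):
    """Cosine between two binary descriptor vectors, each given as a set."""
    if not query or not doc:
        return 0.0
    return len(query & doc) / math.sqrt(len(query) * len(doc))


# Hypothetical toy data.
query = {"PROGRAMMING", "COMPILERS"}
docs = {
    "doc_a": {"PROGRAMMING", "COMPILERS", "PASCAL", "LANGUAGES"},
    "doc_b": {"PROGRAMMING", "INTERPRETERS"},
    "doc_c": {"LINGUISTICS", "LANGUAGES"},
}

scores = {name: cosine(query, descs) for name, descs in docs.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked, scores)
```

The score depends only on the overlap between query and document descriptors, so doc_c scores zero even though it shares LANGUAGES with a matching document.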

On  first  thought,  there  doesn’t  seem  to  be  any  other  information  on  which  to  base  a 
judgment of relevance. However, the PDP retrieval model proposed below exploits an
additional source of information: the internal structure of the database. This structure allows
the model to dynamically determine the relationship between two documents or two
descriptors, and to use this information in computing a relevance measure.

The  PDP  Retrieval  Model 

To understand the retrieval model, it may be useful first to outline the class of PDP
models. These models consist of a large number of simple processing units operating in
parallel. In most cases, each unit represents a possible hypothesis. Units have varying
degrees of confidence in the truth of their hypotheses. The degree of confidence is
quantified by an internal state variable of the unit, its activation level. Units can transmit their
activation levels to one another through connecting links. There are two types of links:
excitatory and inhibitory. When two units represent mutually compatible hypotheses, they
will be connected by an excitatory link. Excitatory links cause the confidence in one
hypothesis to increase the confidence in the other hypothesis. When two units represent
mutually incompatible hypotheses, they will be connected by an inhibitory link. Inhibitory
links cause the confidence in one hypothesis to decrease the confidence in the other
hypothesis. The outcome of any computation in a PDP model is thus the result of
cooperation and competition among a large number of simple processes.

In  the  PDP  retrieval  model,  each  document  and  each  descriptor  is  represented  by  a 
unit.  Figure  1  shows  a  portion  of  a  PDP  system,  with  the  upper  row  of  units  representing 
documents  and  the  lower  row  descriptors.  The  activation  level  of  a  document  unit  indicates 
the  system’s  belief  in  the  relevance  of  the  document,  i.e.,  the  relevance  measure.  Large 
activation  levels  indicate  a  high  degree  of  relevance;  low  or  negative  activation  levels  indicate 
irrelevance. The system decides on the relevance of descriptors as well as documents. Like
that of a document, the relevance of a descriptor is indicated by its activation level.

Links connecting units permit the flow of activation. Mutually excitatory links connect
each document and all descriptors associated with the document. Thus, evidence for a
particular descriptor is evidence for all documents associated with it; and evidence for a
particular document is evidence for all of its descriptors. Furthermore, there are mutually
inhibitory links between every pair of documents. Thus, evidence for one document is
counterevidence for all others.
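
The connectivity just described can be derived mechanically from the document index. The following is a minimal sketch; the three-document index is hypothetical (it is not Figure 1), and representing links as ordered pairs is a choice made here for illustration.

```python
from itertools import permutations

# Hypothetical toy index: document id -> set of descriptors.
index = {
    "d1": {"MODELING", "COMPUTERS"},
    "d2": {"MODELING", "PSYCHOLOGY", "LINGUISTICS"},
    "d3": {"LINGUISTICS", "LANGUAGES"},
}

# Mutually excitatory links: one in each direction between a document
# and every descriptor under which it is indexed.
excitatory = set()
for doc, descs in index.items():
    for desc in descs:
        excitatory.add((doc, desc))
        excitatory.add((desc, doc))

# Mutually inhibitory links: every ordered pair of distinct documents.
inhibitory = set(permutations(index, 2))

print(len(excitatory), len(inhibitory))
```

The number of inhibitory links grows with the square of the number of documents, whereas the number of excitatory links grows only with the number of index entries.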

The dynamics of the model are based on McClelland and Rumelhart’s (1981; Rumelhart
& McClelland, 1982) interactive activation model of word perception. A formal statement of
the activation rules and the parameter values used to implement the model is included in the
Appendix. In simulating the model, time is quantized into discrete steps, and the following
sequence of events occurs during each time step. If a unit has a positive activation level, it
passes its activation through each of its links; otherwise, no activation is passed. Each unit
computes the sum of its incoming activations, weighted by connection strengths associated
with each link. This net input, modulated by the current unit activity (in order to prevent
the unit’s activity from exceeding a certain maximum or minimum level), is added to the
current activity. Finally, a unit loses a fixed percentage of its activation during each time
step, resulting in an exponential decay of activation over time. The system stabilizes when
the net increase in activation to each unit equals the net decrease, that is, when the
excitatory input exactly matches the combination of inhibitory input and decay. In the
implementation to be described, the system approached equilibrium within about 25 time steps.

Querying the model involves activating a set of descriptor units and seeing which
document units become active as a result. In contrast to the Boolean queries described earlier,
the PDP-model query merely specifies a set of relevant descriptors. The set includes both
positive and negative descriptors, the positive descriptors being those that should be
associated with the retrieved documents and the negative descriptors those that should not. The
activation levels of the positive-descriptor units are clamped to the maximum allowed level
and the negative-descriptor units to the minimum allowed level. The activation levels of
these units remain clamped throughout the course of processing and are not affected by
decay or incoming activations.

At an abstract level of description, the system operates as follows. The user activates a
set of descriptors. These descriptors activate a set of documents. The documents in turn
activate new descriptors, which will activate some new documents as well as reinforce the
activation of already active documents, and so on. Activation continually flows from
descriptors to documents and vice-versa. This flow of activation allows descriptors in a
query to indirectly suggest other descriptors that may be useful in the document search, and
it allows active documents to indirectly suggest other documents.
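
The whole cycle can be sketched in a few dozen lines of Python. This is a minimal sketch under stated assumptions, not the implementation used for the results reported here: it uses the activation rules and parameter values as summarized in the Appendix, handles only positive query descriptors, updates all units synchronously, and runs on a hypothetical four-document index that is not a reproduction of Figure 1.

```python
# Parameter values as listed in the Appendix for the test system.
W_DESC_TO_DOC = 0.10    # descriptor -> document (excitatory)
W_DOC_TO_DESC = 0.03    # document -> descriptor (excitatory)
W_DOC_TO_DOC = 0.0075   # document -> document (inhibitory)
DECAY_DOC, DECAY_DESC = 0.10, 0.25
A_MAX, A_MIN = 1.0, -0.2
ALPHA_DOC, ALPHA_DESC = 0.10, 0.30


def positive(x):
    """Zero-threshold identity: only positive activation is transmitted."""
    return x if x > 0 else 0.0


def update(activation, net, decay):
    """Apply decay and add the net input scaled by the distance to the bound."""
    bound = (A_MAX - activation) if net > 0 else (activation - A_MIN)
    return (1.0 - decay) * activation + net * bound


def retrieve(index, query, steps=25):
    """Clamp the query descriptors to A_MAX and run `steps` synchronous updates."""
    descriptors = sorted({d for descs in index.values() for d in descs})
    docs_of = {d: [doc for doc in index if d in index[doc]] for d in descriptors}
    mean_doc_fan = sum(len(descs) for descs in index.values()) / len(index)
    mean_desc_fan = sum(len(docs) for docs in docs_of.values()) / len(descriptors)

    doc_act = {doc: 0.0 for doc in index}
    desc_act = {d: (A_MAX if d in query else 0.0) for d in descriptors}

    for _ in range(steps):
        total_doc_output = sum(positive(a) for a in doc_act.values())
        new_doc = {}
        for doc, descs in index.items():
            scale = (mean_doc_fan / len(descs)) ** ALPHA_DOC
            excite = scale * sum(W_DESC_TO_DOC * positive(desc_act[d]) for d in descs)
            inhibit = W_DOC_TO_DOC * (total_doc_output - positive(doc_act[doc]))
            new_doc[doc] = update(doc_act[doc], excite - inhibit, DECAY_DOC)

        new_desc = {}
        for d in descriptors:
            if d in query:
                new_desc[d] = A_MAX  # clamped: unaffected by decay or input
                continue
            scale = (mean_desc_fan / len(docs_of[d])) ** ALPHA_DESC
            net = scale * sum(W_DOC_TO_DESC * positive(doc_act[doc]) for doc in docs_of[d])
            new_desc[d] = update(desc_act[d], net, DECAY_DESC)

        doc_act, desc_act = new_doc, new_desc
    return doc_act, desc_act


# Hypothetical toy index (document id -> descriptors).
index = {
    "doc1": {"LINGUISTICS", "LANGUAGES"},
    "doc2": {"LINGUISTICS", "LANGUAGES", "COMMUNICATION"},
    "doc3": {"LANGUAGES", "COMMUNICATION"},
    "doc4": {"HARDWARE"},
}
doc_act, desc_act = retrieve(index, query={"LINGUISTICS"})
ranked = sorted(doc_act, key=doc_act.get, reverse=True)
print(ranked)
```

With this toy index, the two documents indexed under the query descriptor rank first, the document that shares only induced descriptors with them ranks next, and the unrelated document, which receives nothing but inhibition, ranks last.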

Implementation  and  Testing  of  the  Model 

A  retrieval  system  was  developed  that  constructs  a  PDP  network  representation  of  a 
database given to it, allows the user to activate sets of descriptor units, and displays
documents ranked by activation level. The system includes a graphics interface that shows the
activity  of  all  documents  and  descriptors. 

The  document  collection  used  to  examine  the  system  consisted  of  books  and  articles 
belonging  to  a  graduate  student  in  our  lab.  The  collection  had  407  documents,  was  indexed 
by  133  descriptors,  and  had  an  average  of  3.4  descriptors  per  document.  It  had  been  built  up 
over the course of several years. Indexing of the documents had been performed as the
documents were added. Consequently, the indexing was somewhat inconsistent. All retrieval
examples  given  in  the  next  section  come  from  this  collection. 

Properties  of  the  Model 

Compensation  for  Inaccuracy  and  Incompleteness  in  the  Query 

During  the  course  of  processing,  an  active  descriptor  may  cause  other  descriptors  to 
become  active.  The  retrieved  set  of  documents  will  be  ones  that  match  either  the  descriptors 
that  were  initially  active  (the  query  descriptors)  or  the  internally  activated  descriptors  (the 
induced  descriptors).  For  example,  consider  what  happens  when  the  descriptor  LINGUISTICS 
in Figure 1 is activated. Documents 48, 49, and 50 will become active, and each will begin
activating its own descriptors; LANGUAGES will thus be activated by documents 49 and 50.
Next, LANGUAGES will activate its documents, sending some activation back to documents 49
and 50, but also activating documents 51 and 52. Thus, documents 51 and 52 become active
not  because  their  descriptors  were  specified  in  the  query,  but  because  the  system  saw  one  of 
their  descriptors  as  being  related  to  descriptors  in  the  query.  In  the  document  collection 
used  as  a  testbed,  the  descriptors  LOGIC  and  REASONING  also  became  slightly  active  after 
LANGUAGES  did,  presumably  because  these  two  descriptors  are  related  to  LANGUAGES  in  the 
same  way  that  LANGUAGES  is  related  to  LINGUISTICS.  In  essence,  the  system  has  added 
LANGUAGES,  and  to  a  lesser  extent  LOGIC  and  REASONING,  to  the  list  of  query  descriptors. 

One descriptor tends to activate another to the extent that the two descriptors
co-occur in the currently active subset of the document collection. The tendency for one
descriptor to activate another is thus influenced by the other descriptors that are active,
because the other active descriptors affect which subset of the collection is active. As an
example of this context dependency, if LANGUAGES is activated along with COMPUTERS, the
descriptors PROGRAMMING, AI, and DESIGN become strongly active, whereas if LANGUAGES is
activated along with PSYCHOLOGY, then COGNITIVE and LINGUISTICS become active.

Clearly,  the  model  achieves  a  flexible  interpretation  of  the  query.  The  query  is  used  as 
a  guide  in  the  retrieval  process,  steering  the  model’s  attention  in  certain  directions,  not  as  a 
rigid criterion that must be matched exactly. This flexibility allows the system to help
compensate for inaccurate and incomplete queries.

Compensation for Inconsistency and Incompleteness in the Indexing of the Document Collection

When  a  document  has  many  descriptors  in  common  with  the  set  of  active  documents, 
the  document  is  likely  to  become  active  itself,  even  if  it  doesn’t  match  the  query  very  well. 
This property is quite useful if descriptors are accidentally omitted in the indexing of a
document. For example, one might suppose that documents 51 and 52 in Figure 1 should have
been indexed with the descriptor LINGUISTICS, because many items that concern LANGUAGES
also concern LINGUISTICS. The system will make precisely this inference when LINGUISTICS is
activated via the induced descriptor mechanism described above.

Thus, the system can help to overcome inconsistency and incompleteness in the indexing
of the collection. Of course, the performance of the system degrades with a degradation
in  indexing,  but  because  of  the  system’s  statistical  nature,  the  degradation  in  performance  is 
gradual. 

Ranking  of  the  Retrieved  Documents 

The  activation  level  of  a  document  is  a  measure  of  its  relevance  according  to  the  model, 
with  the  most  active  documents  being  most  relevant.  Thus,  ranking  the  retrieved  documents 
according  to  their  activation  levels  will  produce  a  set  ordered  by  relevance.  The  ordered  set 
is  quite  useful  when  many  documents  are  retrieved,  because  it  suggests  which  documents  to 
inspect  first. 

Let’s  examine  what  the  ranked  set  of  documents  will  look  like.  With  the  parameter 
values indicated in the Appendix, there will be a very strong tendency for documents
matching the query exactly to be most active, followed by partial matches, followed by documents
having no descriptors in common with the query but having at least one active (induced)
descriptor. Conventional relevance measures cannot distinguish among the exact matches,
and can make only coarse-grain distinctions among the partial matches. This is because these
relevance measures are based solely on the query descriptors. However, the PDP model also
uses the induced descriptors to form its relevance measure. Thus, two documents that share
the same query descriptors may nonetheless be assigned different relevance measures.

Consider  what  happens  when  several  documents  in  the  collection  match  the  query 
exactly, as when LANGUAGES is queried in Figure 1. Documents 49 through 52 will
immediately become active, all to the same extent. Documents 49 and 50 will then support
LINGUISTICS, and documents 50 and 52 COMMUNICATION. These two descriptors will in turn support
their respective documents, with document 50 receiving support from both. Thus, document
50 becomes the most active.

The  most  active  document  within  the  set  of  exact  matches  is  the  one  with  the  greatest 
number  of  highly  active  induced  descriptors;  documents  with  successively  less  activation  have 
fewer  such  descriptors.  Because  the  induced  descriptors  are  derived  from  their  association 
with  many  of  the  active  documents,  one  may  conclude  that  the  most  active  documents  are 
those  that  share  many  descriptors  with  the  other  active  documents.  In  this  sense,  the  most 
active  documents  are  those  most  representative  or  prototypical  of  the  retrieved  set.  A 
representative  document  may  serve  as  a  useful  example  if  a  query  produces  a  large  retrieval 
set. 

Consider now the meaning of representativeness when no document matches the query
exactly but there are many partial matches. For example, this is the case in the test
collection when a query is made for LEARNING and PROGRAMMING. Documents indexed by the
descriptor pairs LEARNING and COMPUTERS, or PROGRAMMING and PSYCHOLOGY, end up
with higher activation levels than those indexed by LEARNING but not COMPUTERS, or
PROGRAMMING but not PSYCHOLOGY. The first subset is most representative of the retrieval set
as a whole, and is clearly a better match to the query than the second. However,
conventional relevance measures are unable to make the distinction between these two subsets.

Providing  Cues  for  Continued  Search 

Information retrieval can be thought of as an iterative process (Tou, Williams, Fikes,
Henderson, & Malone, 1982). The user first formulates a rough query, and in successive
retrieval  attempts  the  query  is  reformulated.  The  induced  descriptors  may  be  helpful  in 
reformulating  the  query.  Because  these  descriptors  are  associated  with  many  of  the  retrieved 
documents,  they  are  useful  for  partitioning  the  retrieval  set.  That  is,  if  the  user  can  state 
that  a  certain  descriptor  should  or  should  not  be  present  among  the  retrieved  documents, 
the  size  of  the  retrieval  set  can  be  reduced.  The  descriptors  carrying  the  most  information 
are  those  associated  with  exactly  half  the  retrieval  set.  Roughly,  this  is  characteristic  of  the 
induced  descriptors  with  highest  activity. 
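
The claim that a descriptor shared by exactly half of the retrieval set carries the most information can be checked with a short calculation. The sketch below measures information as binary entropy, which is a framing assumed here; it further assumes that every retrieved document is equally likely to be the one sought. The counts are hypothetical.

```python
import math


def split_information(n_with, n_total):
    """Bits gained by learning whether the sought document carries a descriptor,
    when n_with of the n_total retrieved documents carry it."""
    p = n_with / n_total
    if p in (0.0, 1.0):
        return 0.0  # the answer is known in advance, so nothing is learned
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


# Hypothetical retrieval set of 20 documents.
for n_with in (1, 5, 10, 15, 20):
    print(n_with, round(split_information(n_with, 20), 3))
```

The value peaks at one bit when the descriptor splits the set evenly and falls to zero when the descriptor is attached to all of the retrieved documents or to none of them.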

Retrieval  by  Example 

A  query  may  be  formulated  using  documents  instead  of  descriptors,  thus  allowing  the 
user to request documents that are similar to a given set of documents. Similarity is
measured in terms of common descriptors. For example, if documents 47 and 50 are activated in
Figure 1, document 48 is also likely to become activated via the descriptors it shares with
documents 47 and 50.

Parameters of the Model

The  model  has  over  a  dozen  parameters  (some  of  which  have  been  removed  from  the 
Appendix  to  simplify  the  presentation),  many  more  if  one  considers  varying  connection 
strengths  for  each  link.  Overall,  it  was  surprising  how  robust  the  basic  properties  of  the 
model  were  under  variation  in  these  parameters.  However,  several  parameters  are  useful  in 
controlling the behavior of the model. These parameters affect the model’s behavior as
follows.

Document-Document Inhibition ($w_{D_iD_j}$) and Descriptor Decay Rate ($\theta_d$)

These parameters control how freely the model associates. If $w_{D_iD_j}$ or $\theta_d$ is large,
activation cannot easily be passed through the system, and the model will tend to retrieve
only documents indexed by one or more of the query descriptors; if sufficiently large, the
model will behave exactly like a vector model (described in the section on Document
Retrieval and in Salton, Fox, & Wu, 1983). Conversely, if these parameters are small, positive
feedback reverberates throughout the model and it "hallucinates"; internal activations begin
to dominate external (query) activations. To avoid hallucinatory behavior, these parameters
are set so that induced descriptors never approach the activation level of the query
descriptors.

Document-to-Descriptor and Descriptor-to-Document Connection Strengths ($w_{D_id_j}$ and $w_{d_jD_i}$)

These  parameters  control  the  strength  of  association  between  a  document  and  a 
descriptor. Some descriptors may be more important in characterizing a document than
others, and this could be reflected in the connection strengths.

Resting  Activation  Levels 

In  the  current  implementation,  all  document  units  have  resting  activation  levels  of  zero, 
that is, their activity decays back to zero. The resting activation levels can be adjusted,
however, to bias retrieval in favor of certain documents. Suppose that the resting activation level
of  a  document  corresponded  to  its  frequency  of  activation  over  long  periods  of  time.  Then 
documents  that  a  user  retrieved  often  would  be  retrieved  more  readily. 

Query  Descriptor  Activation 

Query  descriptors  can  be  assigned  different  levels  of  activation  to  vary  their  relative 
importance  in  the  query.  The  activation  levels  can  be  set  automatically  (e.g.,  using  inverse 
document  frequency,  as  described  by  Sparck  Jones,  1973)  or  directly  by  the  user. 
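
One way the automatic setting might be realized is sketched below. The weight log(N / n), where N is the collection size and n is the number of documents indexed under the descriptor, is one common form of inverse document frequency; both that form and the rescaling so that the rarest query descriptor receives the maximum activation are assumptions made here. The index is hypothetical.

```python
import math


def idf_activations(index, query, a_max=1.0):
    """Map each query descriptor to an activation proportional to log(N / n)."""
    n_docs = len(index)
    idf = {}
    for desc in query:
        n_indexed = sum(1 for descs in index.values() if desc in descs)
        idf[desc] = math.log(n_docs / n_indexed) if n_indexed else 0.0
    top = max(idf.values(), default=0.0)
    if top <= 0:
        # No descriptor discriminates; fall back to uniform activation.
        return {desc: a_max for desc in idf}
    return {desc: a_max * weight / top for desc, weight in idf.items()}


# Hypothetical toy index (document id -> descriptors).
index = {
    "doc_a": {"COMPUTERS", "COMPILERS"},
    "doc_b": {"COMPUTERS", "HARDWARE"},
    "doc_c": {"COMPUTERS", "AI"},
    "doc_d": {"PSYCHOLOGY", "LEARNING"},
}
activations = idf_activations(index, ["COMPUTERS", "COMPILERS"])
print(activations)
```

Here COMPILERS, which indexes a single document, is clamped at the maximum level, while the widely used descriptor COMPUTERS receives about a fifth of that activation.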

Document and Descriptor Fan-in Exponents ($\alpha_D$ and $\alpha_d$)

These  parameters  help  compensate  for  a  bias  built  into  the  system.  To  see  the  bias, 
consider a query of MODELING in Figure 1. Documents 47 and 48 will become activated,
followed by descriptors COMPUTERS, PSYCHOLOGY, and LINGUISTICS. A positive feedback loop
is formed between these documents and their associated descriptors. Assuming activation of
the descriptors from other sources is small, document 47 will become slightly less active than
document 48, simply because it is associated with fewer descriptors. Thus, positive feedback
loops bias activation in favor of documents associated with many descriptors; they do
similarly for descriptors associated with many documents.

One  solution  to  this  dilemma  is  to  replace  each  descriptor  unit  with  a  dipole,  one  unit 
of  the  dipole  representing  the  descriptor  and  the  other  its  complement.  Connections  would 
be made between each document and one of the two dipole units. Thus, all documents
would receive an equal number of inputs, as would all dipoles, causing the bias to disappear.
However,  this  solution  requires  a  massive  interconnection  of  documents  and  dipoles. 

An  inexact  but  practical  solution  is  to  base  the  connection  strength  of  links  coming 
into  a  unit  on  the  number  of  incoming  links.  The  more  links  a  unit  has,  the  less  activity 
each link can provide. This solution has been implemented using $\alpha_D$ and $\alpha_d$. These
parameters were adjusted until the bias was approximately nullified.
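
The size of this correction can be worked out from the form given in the Appendix, in which the summed input to a unit is multiplied by the ratio of the average fan-in to the unit's own fan-in, raised to the fan-in exponent. The sketch below uses the test collection's average of 3.4 descriptors per document and the document exponent of .10; the function name and the example fan-in values are hypothetical.

```python
def fan_in_scale(mean_links, links, exponent):
    """Factor applied to a unit's summed excitatory input: greater than 1 for
    units with fewer incoming links than average, less than 1 for units with more."""
    return (mean_links / links) ** exponent


MEAN_DESCRIPTORS_PER_DOC = 3.4  # reported for the test collection
ALPHA_DOC = 0.10                # document fan-in exponent from the Appendix

for n_descriptors in (2, 3, 6):
    scale = fan_in_scale(MEAN_DESCRIPTORS_PER_DOC, n_descriptors, ALPHA_DOC)
    print(n_descriptors, round(scale, 4))
```

Because the exponent is small, the correction is gentle: a document with two descriptors has its input raised by roughly 5 percent, and one with six descriptors has it lowered by a similar amount.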

Evaluating  the  Model 

The  best  support  for  the  model  lies  in  its  ability  to  produce  surprising  results,  often 
retrieving  documents  that  have  no  descriptors  in  common  with  the  query  yet  are  clearly 
relevant.  These  are  documents  that  no  conventional  deductive  retrieval  system  could  find.  It 
is  difficult  to  draw  firm  conclusions  as  to  the  model’s  utility,  because  systematic  data 
evaluating the model’s performance has yet to be collected. However, the model has been
tested on two other databases, one of operating-system commands and the other of local
restaurants,¹ and the model performs well on both.

Several drawbacks of the model should be noted. First, queries lack the expressive
power of a Boolean query formulation; no distinction is made between "and" and "or." This
problem is inherent in all vector models (Salton, Fox, & Wu, 1983). Second, running the
model on serial hardware in real time with a large database may be difficult. Third, the
model requires that the documents be indexed by a highly overlapping and preferably
correlated set of descriptors. Without such an indexing scheme, there is no internal structure to
the database, and the model performs exactly like a vector model (Salton, Fox, & Wu, 1983).


1. In the restaurant database, restaurants were described by their nationality, location, and cost. Because
restaurants have only one value for each of these parameters, the co-occurrence of descriptors could not be used to
infer, say, that Greek restaurants are somewhat similar to Armenian restaurants, or that a restaurant in North
Park is similar to one in Hillcrest. Consequently, it was necessary to specify the semantics of the descriptors
explicitly. Descriptor semantics were built into the PDP system by linking related descriptors with a strength of
connection corresponding to the degree of association of the descriptors. For example, the Greek descriptor
unit was linked to the Armenian unit, but not to the Japanese unit.

Conclusions 

This paper has presented a new area of application for parallel distributed
computation. Although evaluation has been informal, the PDP approach to information retrieval
seems potentially powerful and robust. At the very least, the PDP approach has pointed out
that the internal structure of a database can be a useful source of knowledge in retrieval. A
stronger claim, however, is that the PDP approach assists users of a retrieval system in
tracking down relevant information, particularly when either the query or the database is
incomplete or imprecise. The assistance provided is threefold. First, a PDP retrieval system is able
to flexibly interpret descriptors in the query and the database. Second, the system ranks
retrieved information precisely and in order of presumed relevance to the user. Third, the
system suggests directions that users may take to further specify their query.

Appendix - Summary of Activation Rules
and Parameter Values of the Test System


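The activity level of document unit i at time t+1 is given by

\[
D_i(t+1) =
\begin{cases}
(1-\theta_D)\,D_i(t) + \eta_{D_i}(t)\,\bigl(M - D_i(t)\bigr) & \text{if } \eta_{D_i}(t) > 0 \\
(1-\theta_D)\,D_i(t) + \eta_{D_i}(t)\,\bigl(D_i(t) - m\bigr) & \text{if } \eta_{D_i}(t) \le 0
\end{cases}
\]

where $\eta_{D_i}(t)$ is the net input to document i at time t, as given by

\[
\eta_{D_i}(t) = \left(\bar{c}_D / c_{D_i}\right)^{\alpha_D} \sum_{j=1}^{n_d} w_{d_jD_i}\,U(d_j(t))
\;-\; \sum_{k=1}^{n_D} w_{D_kD_i}\,U(D_k(t)).
\]

The activity level of descriptor unit j at time t+1 is given by

\[
d_j(t+1) =
\begin{cases}
(1-\theta_d)\,d_j(t) + \eta_{d_j}(t)\,\bigl(M - d_j(t)\bigr) & \text{if } \eta_{d_j}(t) > 0 \\
(1-\theta_d)\,d_j(t) + \eta_{d_j}(t)\,\bigl(d_j(t) - m\bigr) & \text{if } \eta_{d_j}(t) \le 0
\end{cases}
\]

where $\eta_{d_j}(t)$ is the net input to descriptor j at time t, as given by

\[
\eta_{d_j}(t) = \left(\bar{c}_d / c_{d_j}\right)^{\alpha_d} \sum_{i=1}^{n_D} w_{D_id_j}\,U(D_i(t)).
\]

U(x) is the zero-threshold identity function:

\[
U(x) =
\begin{cases}
x & \text{if } x > 0 \\
0 & \text{otherwise}
\end{cases}
\]

All other quantities are parameters of the document collection or are constants
(the values in parentheses are those used in the test system):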

$n_D$          number of documents in the collection
$n_d$          number of descriptors in the collection
$w_{d_jD_i}$   strength of connection from descriptor j to document i
               (0 if no connection; .10 otherwise)
$w_{D_id_j}$   strength of connection from document i to descriptor j
               (0 if no connection; .03 otherwise)
$w_{D_kD_i}$   strength of (inhibitory) connection from document k to document i
               (0 if k = i; .0075 otherwise)
$\theta_d$     descriptor decay rate (.25)
$\theta_D$     document decay rate (.10)
$M$            maximum activity level (1.0)
$m$            minimum activity level (-0.2)
$c_{D_i}$      number of descriptors in document i
$c_{d_j}$      number of documents indexed under descriptor j
$\bar{c}_D$    average number of descriptors per document in collection
$\bar{c}_d$    average number of documents per descriptor in collection
$\alpha_D$     document fan-in exponent (.10)
$\alpha_d$     descriptor fan-in exponent (.30)